Supervised Semantic Indexing for Ranking Documents
نویسندگان
چکیده
Ranking text documents given a query is one of the key tasks in information retrieval. Typical solutions include classical vector space models using weighted word counts and the cosine similarity (TFIDF) with no machine learning at all, or Latent Semantic Indexing (LSI) using unsupervised learning to learn a low dimensional space of “latent concepts” via a reconstruction objective. The former assumes independence of words and cannot capture synonymy or polysemy, whilst the latter is still agnostic to the actual task of interest.
منابع مشابه
Role of semantic indexing for text classification
The Vector Space Model (VSM) of text representation suffers a number of limitations for text classification. Firstly, the VSM is based on the Bag-Of-Words (BOW) assumption where terms from the indexing vocabulary are treated independently of one another. However, the expressiveness of natural language means that lexically different terms often have related or even identical meanings. Thus, fail...
متن کاملEnriching Ontologies with Encyclopedic Background Knowledge for Document Indexing
The rapidly increasing number of scientific documents available publicly on the Internet creates the challenge of efficiently organizing and indexing these documents. Due to the time consuming and tedious nature of manual classification and indexing, there is a need for better methods to automate this process. This thesis proposes an approach which leverages encyclopedic background knowledge fo...
متن کاملEfficient semantic indexing via neural networks with dynamic supervised feedback
We describe a portable system for e cient semantic indexing of documents via neural networks with dynamic supervised feedback. We initially represent each document as a modified TF-IDF sparse vector and then apply a learned mapping to a compact embedding space. This mapping is produced by a shallow neural network which learns a latent representation for the textual graph linking words to nearby...
متن کاملSupervised Locality Preserving Indexing for Text Categorization
A major characteristic of text categorization problems is the prohibitive high dimensionality of the feature space. Most discrimination methods can not work in such a condition, Latent Semantic Indexing (LSI) has been adopted to solve this problem. However, LSI is not an optimal representation for text categorization task mainly because of two reasons: first, the discriminative categorical info...
متن کاملAn Enhanced Indexing And Ranking Technique On The Semantic Web
With the fast growth of the Internet, more and more information is available on the Web. The Semantic Web has many features which cannot be handled by using the traditional search engines. It extracts metadata for each discovered Web documents in RDF or OWL formats, and computes relations between documents. We proposed a hybrid indexing and ranking technique for the Semantic Web which finds rel...
متن کامل